Stochastic attributed K-d tree modeling of technical paper title pages

نویسندگان

  • Song Mao
  • Azriel Rosenfeld
  • Tapas Kanungo
چکیده

Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous regions such as columns, paragraphs, etc.; these regions are called physical components, and define the physical layout of the page. In this paper we develop a class of models for the physical layouts of technical paper title pages. We model physical layout using hidden semi-Markov models for directional projections of page regions, and a stochastic attributed K-d tree grammar model for the 2D hierarchical structure of these regions. We use the models to generate sets of synthetic title page images of three distinctive styles, which we use in controlled experiments on page structure analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling

Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurren-ces of terminals, and thus are ideal for modeling nucleotide canonical base pairs that constitute the RNA secondary structure. Stochastic grammars h...

متن کامل

Confidence Interval Estimation of the Mean of Stationary Stochastic Processes: a Comparison of Batch Means and Weighted Batch Means Approach (TECHNICAL NOTE)

Suppose that we have one run of n observations of a stochastic process by means of computer simulation and would like to construct a condifence interval for the steady-state mean of the process. Seeking for independent observations, so that the classical statistical methods could be applied, we can divide the n observations into k batches of length m (n= k.m) or alternatively, transform the cor...

متن کامل

Modeling with extended fault trees

In the areas of both safety and reliability analysis the precise modeling of complex technical systems during development and for evaluation purposes is of great importance. Traditionally, fault tree models have been used to accomplish this, and, more recently, stochastic Petri-net models have begun to be employed. To provide engineers with an intuitive high-level modeling interface to Petri-ne...

متن کامل

An efficient algorithm for finding the semi-obnoxious $(k,l)$-core of a tree

In this paper we study finding the $(k,l)$-core problem on a tree which the vertices have positive or negative weights. Let $T=(V,E)$ be a tree. The $(k,l)$-core of $T$ is a subtree with at most $k$ leaves and with a diameter of at most $l$ which the sum of the weighted distances from all vertices to this subtree is minimized. We show that, when the sum of the weights of vertices is negative, t...

متن کامل

Multi-horizon stochastic programming

@article{KautEA12, author = {Michal Kaut and Kjetil T. Midthun and Adrian S. Werner and Asgeir Tomasgard and Lars Hellemo and Marte Fodstad}, title = {Multi-horizon stochastic programming}, journal = {Computational Management Science}, year = {2014}, volume = {11}, number = {1--2}, pages = {179--193}, note = {Special Issue: Computational Techniques in Management Science}, doi = {10.1007/s10287-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003